Print optimal number of maPCA components and plot optimization curves #839

eurunuela · 2022-02-08T19:14:39Z

Closes #834 .

Changes proposed in this pull request:

Gets the optimal number of components and optimization curves for all three criteria in maPCA.
Plots the optimization curves of all three criteria in maPCA.
Prints the optimal number of components given by all three criteria to the logs.
Prints and plots the optimal number of components given by 90% and 95% of variance explained when maPCA returns enough components.
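
As a rough illustration of the variance-explained criteria in the last item, the sketch below picks the smallest number of components whose cumulative explained-variance ratio reaches a threshold, and returns None when the available components never reach it (the "when maPCA returns enough components" case). The function name and the toy ratios are hypothetical; this is not the code used in maPCA or tedana.

```python
from itertools import accumulate


def n_components_for_variance(varex_ratio, threshold):
    """Return the smallest component count reaching `threshold`, or None."""
    for n, cumulative in enumerate(accumulate(varex_ratio), start=1):
        if cumulative >= threshold:
            return n
    # The supplied components never explain enough variance.
    return None


# Toy ratios (exact binary fractions), sorted from largest to smallest.
ratios = [0.5, 0.25, 0.125, 0.0625, 0.0625]
n_90 = n_components_for_variance(ratios, 0.90)
n_95 = n_components_for_variance(ratios, 0.95)
# With only the first three components, neither threshold can be reached.
n_90_truncated = n_components_for_variance(ratios[:3], 0.90)
```

Returning None instead of raising lets the caller skip the corresponding log message and plot line when a threshold is out of reach.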

Edit: this draft PR will fail until #47 and #48 are merged in maPCA.

eurunuela · 2022-02-10T15:27:56Z

@handwerkerd I don't know what you want the PCA_cross_component_metrics file to contain exactly, so I am willing to leave that to you. What do you think of the changes I've made so far in this PR?

handwerkerd · 2022-02-11T21:10:49Z

I was hoping to read this carefully today, but it's not going to happen. I'll try to give it a closer read soon. When I was discussing PCA_cross_component_metrics, it was in reference to the cross_component_metrics that I've added to the component selection class in #756. Here is one example where it is used:

tedana/tedana/selection/selection_nodes.py

Lines 647 to 652 in f011ba2

    
    outputs["kappa_elbow_kundu"] = kappa_elbow_kundu(
        component_table, selector.n_echos
    )
    selector.cross_component_metrics["kappa_elbow_kundu"] = outputs[
        "kappa_elbow_kundu"
    ]

The basic idea is to have a dictionary for all values that are calculated based on metrics across components (i.e., the kappa and rho elbows). This complements the component table, where each metric has a value for each component. In the current code, these cross-component values are either written to the log and nowhere else, or not saved at all. Since your criterion thresholds are the same general concept, I figured you could create a similar dictionary with a similar naming style.
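
To make the idea concrete, here is a hypothetical sketch of such a dictionary for the PCA step. The helper and the key names are assumptions chosen to mirror the kappa_elbow_kundu naming style, not the names tedana ended up using, and the component counts are placeholders rather than real results.

```python
def build_pca_cross_component_metrics(n_components_by_criterion):
    """Collect one value per criterion into a flat, JSON-friendly dict."""
    metrics = {}
    for criterion, n_components in n_components_by_criterion.items():
        # Hypothetical naming style: one key per cross-component value.
        metrics[f"n_components_{criterion}"] = int(n_components)
    return metrics


# Placeholder counts, for illustration only.
pca_cross_component_metrics = build_pca_cross_component_metrics(
    {"aic": 20, "kic": 18, "mdl": 15}
)
```

Casting to int keeps the values as plain Python numbers, so the dictionary can be written to JSON without further conversion.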

codecov · 2022-02-22T16:32:14Z

Codecov Report

Merging #839 (52ffad8) into main (d4406e4) will increase coverage by 0.16%.
The diff coverage is 98.64%.

@@            Coverage Diff             @@
##             main     #839      +/-   ##
==========================================
+ Coverage   93.15%   93.31%   +0.16%     
==========================================
  Files          27       27              
  Lines        2234     2304      +70     
==========================================
+ Hits         2081     2150      +69     
- Misses        153      154       +1

Impacted Files	Coverage Δ
tedana/io.py	`94.03% <75.00%> (-0.36%)`	⬇️
tedana/decomposition/pca.py	`90.16% <100.00%> (+3.64%)`	⬆️
tedana/reporting/__init__.py	`100.00% <100.00%> (ø)`
tedana/reporting/static_figures.py	`98.79% <100.00%> (+0.29%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

eurunuela · 2022-02-25T10:13:28Z

The three-echo test should pass once we merge PR #50 in maPCA.

eurunuela · 2022-03-16T11:01:57Z

The test will pass once PR #50 is merged in maPCA.

handwerkerd · 2022-03-16T19:19:10Z

I just tried to locally run the three-echo test with mapca after #50 was merged. The test is still failing in io.py with

Object of type ndarray is not JSON serializable
  File "/Users/handwerkerd/code/tedana_community/me-ica/tedana/tedana/io.py", line 253, in save_json
    json.dump(data, fo, indent=4, sort_keys=True)
  File "/Users/handwerkerd/code/tedana_community/me-ica/tedana/tedana/io.py", line 201, in save_file
    self.save_json(data, name)
  File "/Users/handwerkerd/code/tedana_community/me-ica/tedana/tedana/decomposition/pca.py", line 322, in tedpca
    io_generator.save_file(mapca_results, "PCA cross component metrics json")
  File "/Users/handwerkerd/code/tedana_community/me-ica/tedana/tedana/workflows/tedana.py", line 638, in tedana_workflow
    dd, n_components = decomposition.tedpca(
  File "/Users/handwerkerd/code/tedana_community/me-ica/tedana/tedana/tests/test_integration.py", line 181, in test_integration_three_echo (Current frame)
    tedana_cli.tedana_workflow(

The four-echo test is passing so this isn't a universal issue. I'm not sure I have time to dig into this more right now, but wanted to share.

tsalo · 2022-03-16T19:20:27Z

I think we need to automatically convert numpy arrays to lists in save_json.
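
A minimal sketch of that idea, assuming the conversion is done through the default hook of the json module rather than inside tedana's actual save_json (whose body is not shown in this thread). The names to_jsonable and dumps_with_arrays are hypothetical. The hook relies on the tolist() method that NumPy arrays and NumPy scalars both provide, so it does not need to import NumPy itself.

```python
import json


def to_jsonable(obj):
    """json `default` hook: convert array-like objects to plain Python."""
    # NumPy arrays and NumPy scalars both provide tolist().
    if hasattr(obj, "tolist"):
        return obj.tolist()
    # Mirror the standard error for anything we cannot convert.
    raise TypeError(
        f"Object of type {type(obj).__name__} is not JSON serializable"
    )


def dumps_with_arrays(data):
    """Serialize `data`, converting array-like values on the fly."""
    return json.dumps(data, indent=4, sort_keys=True, default=to_jsonable)
```

The hook is only called for objects the encoder cannot handle on its own, so ordinary dicts, lists, strings, and numbers are serialized exactly as before.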

jbteves · 2022-05-09T18:29:47Z

tedana/decomposition/pca.py

+        voxel_comp_weights = ma_pca.u_
+        varex = ma_pca.explained_variance_
+        varex_norm = ma_pca.explained_variance_ratio_
+        comp_ts = ma_pca.components_.T


This looks like we're accessing private components of the object. Could we either link to some documentation about the returned object or elaborate on what we're accessing here?

tsalo · 2022-05-09

The trailing underscore indicates that the attribute is estimated from the data (via fit), rather than that it's private. We adopted this convention from scikit-learn.

jbteves · 2022-05-09

Ah, that makes sense. Sorry, I had a momentary flip in my brain and put the underscore on the wrong side. I do think it's worth a quick reference to the docs.
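
For readers unfamiliar with the convention, here is a toy estimator that follows it. The class is purely illustrative and is not how maPCA is implemented: constructor parameters use plain names, a leading underscore marks something private, and a trailing underscore marks a public attribute that only exists after fit has estimated it from the data.

```python
class MeanEstimator:
    """Toy estimator following the scikit-learn attribute convention."""

    def __init__(self, ddof=0):
        self.ddof = ddof      # plain name: a parameter set by the user
        self._cache = None    # leading underscore: private by convention

    def fit(self, values):
        # Trailing underscore: public attributes estimated from the data.
        self.mean_ = sum(values) / len(values)
        squared = sum((v - self.mean_) ** 2 for v in values)
        self.variance_ = squared / (len(values) - self.ddof)
        return self


est = MeanEstimator().fit([1.0, 2.0, 3.0, 6.0])
```

Before fit is called, mean_ and variance_ do not exist at all, which is what distinguishes them from ordinary parameters such as ddof.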

jbteves · 2022-05-09T18:31:08Z

tedana/info.py

@@ -29,7 +29,7 @@

 REQUIRES = [
    "bokeh<2.3.0",
-    "mapca~=0.0.1",
+    "mapca>=0.0.2",


Unfortunately we'll need to resolve the conflict with main where we've moved everything into a different setup organization (in particular, setup.cfg in the install_requires section).

jbteves · 2022-05-09T18:32:56Z

tedana/reporting/static_figures.py

@@ -288,3 +288,144 @@ def comp_figures(ts, mask, comptable, mmix, io_generator, png_cmap):
        compplot_name = os.path.join(io_generator.out_dir, "figures", plot_name)
        plt.savefig(compplot_name)
        plt.close()
+
+
+def pca_results(criteria, n_components, all_varex, io_generator):


This function has a large amount of repetition. Could we break it into smaller functions, parametrized perhaps by something like the input data and the label only, and if matplotlib requires this, the figure itself?
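
One possible shape for that refactor, sketched under assumptions: ax is expected to be a matplotlib Axes (only its plot, axvline, and legend methods are used), and the function names and argument layout are hypothetical rather than what was eventually committed.

```python
def plot_criterion(ax, values, label, n_components, color):
    """Draw one criterion curve and mark its selected component count."""
    x = list(range(1, len(values) + 1))
    ax.plot(x, values, color=color, label=label)
    # Dashed vertical line at the number of components this criterion picks.
    ax.axvline(
        n_components,
        color=color,
        linestyle="--",
        label=f"{label} optimum: {n_components}",
    )
    return ax


def plot_all_criteria(ax, criteria):
    """Call the helper once per (label, values, n_components, color) tuple."""
    for label, values, n_components, color in criteria:
        plot_criterion(ax, values, label, n_components, color)
    ax.legend()
    return ax
```

Passing the Axes in, rather than creating a figure inside the helper, keeps figure creation and saving in one place in the calling function.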

jbteves · 2022-05-09T18:34:25Z

Hi Eneko, I left some comments but didn't formally review. Let me know if you'd like for me to fix the merge conflicts and some of the refactors I mentioned as possibilities for you. However I do think it's worth considering some more documentation about how we're extracting these values from the mapca object.

eurunuela · 2022-05-10T18:08:01Z

> Hi Eneko, I left some comments but didn't formally review. Let me know if you'd like for me to fix the merge conflicts and some of the refactors I mentioned as possibilities for you. However I do think it's worth considering some more documentation about how we're extracting these values from the mapca object.

Feel free to make the changes. I won't be able to work on this until I come back from ISMRM.

Thank you for the comments by the way!

jbteves · 2022-05-11T19:52:07Z

Hm, I merged the changes but it seems like some expected attribute is missing. It looks like CircleCI is just caching the last environment, so that it's not checking the versions in setup.cfg to update them appropriately, or perhaps some other error I'm missing. Any thoughts @eurunuela @tsalo ?

jbteves · 2022-05-13T15:38:41Z

@eurunuela I think you need to cut a new release with the changes so that we can pin that release.

eurunuela · 2022-05-16T11:34:49Z

> @eurunuela I think you need to cut a new release with the changes so that we can pin that release.

Done!

jbteves

LGTM, thanks @eurunuela !

handwerkerd

Mostly LGTM.
The one thing I noticed is that there is a file called ./figures/pca_variance_explained.png. It looks similar to the relevant file, ./figures/pca_criteria.png, but it just contains the vertical dashed lines without the criteria curves. My guess is that this was useful while working on the code, but isn't useful as a saved output. If I'm right, just don't save that file?

eurunuela · 2022-05-16T19:51:41Z

> Mostly LGTM. The one thing I noticed is that there is a file called ./figures/pca_variance_explained.png. It looks similar to the relevant file, ./figures/pca_criteria.png, but it just contains the vertical dashed lines without the criteria curves. My guess is that this was useful while working on the code, but isn't useful as a saved output. If I'm right, just don't save that file?

I'm sorry @handwerkerd, that was a mistake on my end. Here's what the figure should look like:

I'm committing the fix now.

handwerkerd · 2022-05-16T20:11:53Z

Yay! That does look useful and I'm glad I noticed. In your sample figure above, I don't see a blue dashed line for AIC.

eurunuela · 2022-05-16T20:15:04Z

> Yay! That does look useful and I'm glad I noticed. In your sample figure above, I don't see a blue dashed line for AIC.

That's because it's right behind the KIC line. The figure was generated with the three-echo integration test, by the way.

handwerkerd

LGTM

jbteves

LGTM

eurunuela added 4 commits February 8, 2022 20:11

Retrieve optimal number of PCA components and plot optimization curves

f6f5438

Updated to retrieve variance explained criteria from maPCA

524903e

Added prints for PCA info and tested the plot gets generated correctly

2f004e8

Trigger tests

3f162cc

Increase minimum maPCA version

0348d7c

Merge remote-tracking branch 'upstream/main' into enh/pca_ncomps

a199ecf

eurunuela mentioned this pull request Feb 22, 2022

Made AIC the default maPCA option #849

Merged

Updated expected results for three-echo test

5e48e81

eurunuela added 2 commits February 25, 2022 11:05

Generate variance explained plot and save cross component metrics

2cd50ec

Update __init__.py

0efdf7a

eurunuela added 2 commits March 16, 2022 11:15

Save maPCA results into dictionary

66cad57

Update minimum maPCA version required

07b5d2c

eurunuela marked this pull request as ready for review March 16, 2022 10:53

eurunuela mentioned this pull request Mar 16, 2022

Return variance explained for all components ME-ICA/mapca#50

Merged

eurunuela mentioned this pull request Mar 17, 2022

Object of type ndarray is not JSON serializable #858

Closed

jbteves reviewed May 9, 2022

Merge branch 'main' into enh/pca_ncomps

e361e16

Resolves conflicts:
- info.py: deletes and moves dependency version to setup.cfg
- cornell_three_echo_outputs: adds pca_criteria.png, keeps main outputs

Fixes numpy int issues

81259cb

Joshua Teves added 2 commits May 16, 2022 09:52

Pins later mapca version

8c96560

Fix bad merge

e4cc27d

jbteves previously approved these changes May 16, 2022

handwerkerd self-requested a review May 16, 2022 19:21

handwerkerd requested changes May 16, 2022

Fix explained variance figure

09aa092

eurunuela dismissed jbteves’s stale review via 09aa092 May 16, 2022 19:52

eurunuela requested review from handwerkerd and jbteves May 16, 2022 19:52

Removed breakpoint

52ffad8

handwerkerd approved these changes May 17, 2022

jbteves approved these changes May 17, 2022

eurunuela merged commit fc61dec into ME-ICA:main May 17, 2022
